A Closer Look at the Closest String and Closest Substring Problem

Authors

  • Markus Chimani
  • Matthias Woste
  • Sebastian Böcker
Abstract

Let S be a set of k strings over an alphabet Σ; each string has a length between ℓ and n. The Closest Substring Problem (CSSP) is to find a minimal integer d (and a corresponding string t of length ℓ) such that each string s ∈ S has a substring of length ℓ with Hamming distance at most d to t. We say t is the closest substring to S. For ℓ = n, this problem is known as the Closest String Problem (CSP). Particularly in computational biology, the CSP and CSSP have found numerous practical applications, such as identifying regulatory motifs and approximate gene clusters, and in degenerate primer design. We study ILP formulations for both problems. Our experiments show that a position-based formulation for the CSP performs very well on real-world instances emerging from biology. Even on randomly generated instances that are hard to solve to optimality, solving the root relaxation leads to solutions very close to the optimum. For the CSSP, we give a new formulation that is polytope-wise stronger than a straightforward extension of the CSP formulation. Furthermore, we propose a strengthening constraint class that speeds up the running time.
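
To make the problem definition concrete, the following is a minimal brute-force sketch of the CSSP in Python. It is not the ILP approach studied in the paper; it simply enumerates every candidate string t of length ℓ over Σ and is only practical for tiny inputs. The function names (`hamming`, `distance_to_best_window`, `closest_substring`) are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_to_best_window(t: str, s: str) -> int:
    """Smallest Hamming distance between t and any substring of s of length len(t).

    Assumes len(s) >= len(t); otherwise min() raises ValueError on an empty sequence.
    """
    ell = len(t)
    return min(hamming(t, s[i:i + ell]) for i in range(len(s) - ell + 1))


def closest_substring(strings, ell, alphabet):
    """Return (d, t): the minimal radius d and one string t of length ell attaining it.

    Exhaustive search over all len(alphabet)**ell candidates (illustration only).
    """
    best_d, best_t = None, None
    for chars in product(alphabet, repeat=ell):
        t = "".join(chars)
        # The radius of t is its distance to the worst-served input string.
        d = max(distance_to_best_window(t, s) for s in strings)
        if best_d is None or d < best_d:
            best_d, best_t = d, t
    return best_d, best_t
```

For example, `closest_substring(["GATTACA", "TTAC", "ATTACK"], 4, "ACGT")` returns `(0, "TTAC")`, since "TTAC" occurs exactly in all three strings. Setting `ell` equal to the common string length gives the CSP special case: `closest_substring(["AAB", "ABA", "BAA"], 3, "AB")` returns `(1, "AAA")`.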


Similar resources

More Efficient Algorithms for Closest String and Substring Problems

The closest string and substring problems find applications in PCR primer design, genetic probe design, motif finding, and antisense drug design. For their importance, the two problems have been extensively studied recently in computational biology. Unfortunately both problems are NP-complete. Researchers have developed both fixed-parameter algorithms and approximation algorithms for the two pr...


A Meta Heuristic Solution for Closest String Problem Using Ant Colony System

Suppose Σ is the alphabet and S is a set of strings of equal length over Σ. The closest substring problem seeks a substring over Σ that minimizes the maximum Hamming distance to the other substrings in S. The closest substring problem is NP-complete. This problem has particular importance in computational biology and coding theory. In this paper we prese...


On The Closest String and Substring Problems (preprint 0002012v1, 17 Feb 2000)

The problem of finding a center string that is ‘close’ to every given string arises and has many applications in computational molecular biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring problem. Assume that we are given a set of strings S = {s1, s2, ..., sn}, say, each of length m. The Closest String problem [1, 2, 4, 5...


Kernel Lower Bounds on String Problems

In the Closest String problem we are given a set of strings S = {s1, s2, ..., sk} over an alphabet Σ such that |si| = n, and an integer d. The objective is to check whether there exists a string s over Σ such that dH(s, si) ≤ d for all i ∈ {1, ..., k}, where dH(x, y) denotes the number of positions at which strings x and y differ. Closest String is a prototype string problem. This problem together with severa...


On The Parameterized Intractability Of Motif Search Problems

We show that Closest Substring, one of the most important problems in the field of consensus string analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so, even over a binary alphabet). This is done by giving a “strongly structure-preserving” reduction from the graph problem Clique to Closest Substring. This problem is therefore unlikely to be solvable in tim...




Publication date: 2011